library(tidycensus)
library(tidyverse)
census_api_key("YOUR API KEY GOES HERE")

Inspect variables for Decennial Census

decennial_variables<-load_variables(2010,"sf1")
View(decennial_variables)
name label concept
H001001 Total HOUSING UNITS
H002001 Total URBAN AND RURAL
H002002 Total!!Urban URBAN AND RURAL
H002003 Total!!Urban!!Inside urbanized areas URBAN AND RURAL
H002004 Total!!Urban!!Inside urban clusters URBAN AND RURAL
H002005 Total!!Rural URBAN AND RURAL
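
The full variable table runs to thousands of rows, so filtering it is often quicker than scrolling through View(). Here is a minimal sketch of that idea. It uses a small hand-built stand-in (called decennial_variables_demo, a name made up for this example) with rows copied from the table above, so that it runs without an API key; the same filter could be applied to the real decennial_variables object.

```r
library(dplyr)
library(stringr)

# Stand-in for the table returned by load_variables(2010, "sf1"),
# built by hand from rows shown above (an assumption made for illustration).
decennial_variables_demo <- tibble(
  name    = c("H001001", "H002001", "H002002", "H002005"),
  label   = c("Total", "Total", "Total!!Urban", "Total!!Rural"),
  concept = c("HOUSING UNITS", "URBAN AND RURAL", "URBAN AND RURAL", "URBAN AND RURAL")
)

# Keep only the variables whose label mentions "Rural"
rural_variables <- decennial_variables_demo %>%
  filter(str_detect(label, "Rural"))

rural_variables
```

Searching the concept column in the same way is a handy way to find every variable that belongs to one table.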

Calling and Manipulating Data using tidycensus and dplyr: Basics

Calling and Viewing Data

Let’s find out the population, by state, in 2010:

state_population_2010<-get_decennial(geography = "state", 
                                     variables = "P001001", 
                                     geometry=TRUE,
                                     shift_geo = TRUE,
                                     year = 2010)
View(state_population_2010)
GEOID NAME variable value geometry
04 Arizona P001001 6392017 MULTIPOLYGON (((-1111066 -8…
05 Arkansas P001001 2915918 MULTIPOLYGON (((557903.1 -1…
06 California P001001 37253956 MULTIPOLYGON (((-1853480 -9…
08 Colorado P001001 5029196 MULTIPOLYGON (((-613452.9 -…
09 Connecticut P001001 3574097 MULTIPOLYGON (((2226838 519…
11 District of Columbia P001001 601723 MULTIPOLYGON (((1960720 -41…

We can adjust the geography and year parameters; let’s say we want the population distribution across CO counties in the year 2010:

CO_county_population_2010<-get_decennial(geography = "county", 
                                         state="CO",
                                         variables = "P001001", 
                                         year = 2010)
View(CO_county_population_2010)
GEOID NAME variable value
08023 Costilla County, Colorado P001001 3524
08025 Crowley County, Colorado P001001 5823
08027 Custer County, Colorado P001001 4255
08029 Delta County, Colorado P001001 30952
08031 Denver County, Colorado P001001 600158
08035 Douglas County, Colorado P001001 285465

Cleaning and Manipulating Data Using dplyr

Let’s clean up the table by dropping the “variable” column and renaming the “value” column to “population”, which we can do using functions from the “dplyr” package.

CO_county_population_2010<-get_decennial(geography = "county", 
                                         state="CO",
                                         variables = "P001001", 
                                         year = 2010) %>% 
                           select(-variable) %>% 
                           rename(population=value)

View(CO_county_population_2010)
GEOID NAME population
08023 Costilla County, Colorado 3524
08025 Crowley County, Colorado 5823
08027 Custer County, Colorado 4255
08029 Delta County, Colorado 30952
08031 Denver County, Colorado 600158
08035 Douglas County, Colorado 285465

It’s also possible to call multiple variables into a single table. To see this, let’s add a field/column containing the rural population in each state in 2010 (as well as the total population in that year), and order the dataset in descending order with respect to the rural population (such that the state with the largest rural population will appear as the first record in the dataset):

state_pop_ruralpop_2010<-get_decennial(geography = "state", 
                                          variables = c("P001001", "P002005"),
                                          output="wide",
                                          year = 2010) %>% 
                         rename(total_population=P001001, rural_population=P002005) %>% 
                         arrange(desc(rural_population))

state_pop_ruralpop_2010
GEOID NAME total_population rural_population
48 Texas 25145561 3847522
37 North Carolina 9535483 3233727
42 Pennsylvania 12702379 2711092
39 Ohio 11536504 2546810
26 Michigan 9883640 2513683
13 Georgia 9687653 2415502
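
By default, tidycensus returns data in a "tidy" (long) layout, with one row per place and variable; setting output="wide" gives one column per variable instead. The sketch below imitates that reshaping with tidyr's pivot_wider on a small hand-built table (long_demo is a made-up name, and the values are copied from the output above), so it runs without an API key. It is only an illustration of the layout difference, not a description of how tidycensus reshapes data internally.

```r
library(dplyr)
library(tidyr)

# Long layout: one row per state and variable (values taken from the table above)
long_demo <- tibble(
  GEOID    = c("48", "48", "39", "39"),
  NAME     = c("Texas", "Texas", "Ohio", "Ohio"),
  variable = c("P001001", "P002005", "P001001", "P002005"),
  value    = c(25145561, 3847522, 11536504, 2546810)
)

# Wide layout: one row per state, one column per variable
wide_demo <- long_demo %>%
  pivot_wider(names_from = variable, values_from = value) %>%
  rename(total_population = P001001, rural_population = P002005) %>%
  arrange(desc(rural_population))

wide_demo
```

The wide layout is convenient when we want to compute one variable from another, since both values then sit in the same row.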

Let’s generate a new variable based on the variables we already have in the dataset. This variable will measure the percentage of each state’s population that lives in rural areas (calculated by dividing the rural population by the total population, and multiplying by 100). We’ll call this variable “rural_pct”. We’ll also re-sort the dataset, so that it’s sorted in descending order with respect to the new “rural_pct” variable, instead of the actual number of rural residents:

state_pop_ruralpop_2010<-
  state_pop_ruralpop_2010 %>% mutate(rural_pct=(rural_population/total_population)*100) %>% 
                              arrange(desc(rural_pct))
  
View(state_pop_ruralpop_2010)
GEOID NAME total_population rural_population rural_pct
23 Maine 1328361 814819 61.34018
50 Vermont 625741 382356 61.10451
54 West Virginia 1852994 950184 51.27831
28 Mississippi 2967297 1503073 50.65462
30 Montana 989415 436401 44.10697
05 Arkansas 2915918 1278329 43.83968

The dplyr package also makes it easy to filter datasets based on specific criteria, which we can then assign to a new object. For example, let’s say that we want to generate a new dataset that only includes states whose rural populations are greater than 40% of their overall populations. We’ll assign this new dataset to an object called “rural_pct_over40”:

rural_pct_over40<-state_pop_ruralpop_2010 %>% filter(rural_pct>40)
View(rural_pct_over40)
GEOID NAME total_population rural_population rural_pct
23 Maine 1328361 814819 61.34018
50 Vermont 625741 382356 61.10451
54 West Virginia 1852994 950184 51.27831
28 Mississippi 2967297 1503073 50.65462
30 Montana 989415 436401 44.10697
05 Arkansas 2915918 1278329 43.83968
46 South Dakota 814180 352933 43.34828
21 Kentucky 4339367 1806024 41.61953
01 Alabama 4779736 1957932 40.96318
38 North Dakota 672591 269719 40.10149
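
The filter function also accepts several conditions at once. As a small sketch (using a hand-built table, states_demo, whose name is made up and whose figures are copied from the output above), here is how we might keep only states that are more than 40% rural and also have more than one million residents:

```r
library(dplyr)

# Hand-built stand-in for state_pop_ruralpop_2010 (figures from the tables above)
states_demo <- tibble(
  NAME             = c("Maine", "Alabama", "North Dakota", "Texas"),
  total_population = c(1328361, 4779736, 672591, 25145561),
  rural_population = c(814819, 1957932, 269719, 3847522)
)

large_rural_states <- states_demo %>%
  mutate(rural_pct = (rural_population / total_population) * 100) %>%
  # Both conditions must hold for a row to be kept
  filter(rural_pct > 40, total_population > 1000000) %>%
  arrange(desc(rural_pct))

large_rural_states
```

Conditions separated by commas inside filter are combined with a logical "and"; use the | operator when either condition should be enough.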

Student Exercise

Create a dataset of Colorado counties whose rural population exceeded 50% of the county’s overall population in 2010, and sort the dataset in descending order with respect to the field containing the percentage of the county’s residents who live in rural areas. Your final dataset should look something like this:

GEOID NAME total_population rural_population rural_pct
08023 Costilla County, Colorado 3524 3524 100.00000
08025 Crowley County, Colorado 5823 5823 100.00000
08027 Custer County, Colorado 4255 4255 100.00000
08033 Dolores County, Colorado 2064 2064 100.00000
08039 Elbert County, Colorado 23086 23086 100.00000
08047 Gilpin County, Colorado 5441 5441 100.00000
08053 Hinsdale County, Colorado 843 843 100.00000
08057 Jackson County, Colorado 1394 1394 100.00000
08061 Kiowa County, Colorado 1398 1398 100.00000
08073 Lincoln County, Colorado 5467 5467 100.00000
08079 Mineral County, Colorado 712 712 100.00000
08091 Ouray County, Colorado 4436 4436 100.00000
08093 Park County, Colorado 16206 16206 100.00000
08095 Phillips County, Colorado 4442 4442 100.00000
08111 San Juan County, Colorado 699 699 100.00000
08109 Saguache County, Colorado 6108 6108 100.00000
08103 Rio Blanco County, Colorado 6666 6666 100.00000
08113 San Miguel County, Colorado 7359 7359 100.00000
08115 Sedgwick County, Colorado 2379 2379 100.00000
08121 Washington County, Colorado 4814 4814 100.00000
08009 Baca County, Colorado 3788 3788 100.00000
08021 Conejos County, Colorado 8256 8256 100.00000
08017 Cheyenne County, Colorado 1836 1836 100.00000
08019 Clear Creek County, Colorado 9088 9088 100.00000
08049 Grand County, Colorado 14843 12260 82.59786
08083 Montezuma County, Colorado 25535 17155 67.18230
08125 Yuma County, Colorado 10043 6519 64.91088
08029 Delta County, Colorado 30952 19553 63.17201
08119 Teller County, Colorado 23350 14618 62.60385
08105 Rio Grande County, Colorado 11982 7493 62.53547
08067 La Plata County, Colorado 51334 30774 59.94857
08007 Archuleta County, Colorado 12084 7175 59.37603
08051 Gunnison County, Colorado 15324 8981 58.60741
08055 Huerfano County, Colorado 6711 3768 56.14662
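
If you get stuck, one possible approach (certainly not the only one) is to wrap the mutate, filter, and arrange steps in a small helper function. The helper below, rural_share_over, is a hypothetical name invented for this sketch; it assumes a wide table that already has total_population and rural_population columns. It is demonstrated on three rows copied from the table above so that it runs without an API key.

```r
library(dplyr)

# Hypothetical helper: expects columns total_population and rural_population
rural_share_over <- function(data, threshold = 50) {
  data %>%
    mutate(rural_pct = (rural_population / total_population) * 100) %>%
    filter(rural_pct > threshold) %>%
    arrange(desc(rural_pct))
}

# Three Colorado counties copied from the expected output above
co_demo <- tibble(
  GEOID            = c("08055", "08049", "08023"),
  NAME             = c("Huerfano County, Colorado", "Grand County, Colorado",
                       "Costilla County, Colorado"),
  total_population = c(6711, 14843, 3524),
  rural_population = c(3768, 12260, 3524)
)

over_50 <- rural_share_over(co_demo)                  # keeps all three rows
over_60 <- rural_share_over(co_demo, threshold = 60)  # drops Huerfano County

over_50
```

With real data, the input would be a county-level get_decennial call for Colorado with output="wide", followed by the same rename step used for the state table earlier.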

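More Advanced Data Wrangling

Iteration, Temporal Dynamics, and Exploratory Visualization

Let’s use the “map” function from the “purrr” package to request the same variables for both 2000 and 2010, calculate each state’s rural population percentage in each year, and then compute the change between the two censuses: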

my_years<-c(2000,2010)
population_rural_2000_2010<-map(
  my_years,
  ~(get_decennial(geography = "state", 
                  variables = c("P001001", "P002005"),
                  output="wide",
                  year =.)) %>% 
    mutate(rural_pct=(P002005/P001001)*100) %>% 
    arrange(NAME)
)
## Getting data from the 2000 decennial Census
## Using Census Summary File 1
## Getting data from the 2010 decennial Census
## Using Census Summary File 1
names(population_rural_2000_2010)<-my_years

rural_change<-full_join(population_rural_2000_2010[["2000"]],
                        population_rural_2000_2010[["2010"]],by="NAME") %>%  
              mutate(rural_pct_change=rural_pct.y-rural_pct.x) %>% 
              select(NAME,rural_pct_change)
rural_change
## # A tibble: 52 x 2
##    NAME                 rural_pct_change
##    <chr>                           <dbl>
##  1 Alabama                        -3.59 
##  2 Alaska                         -0.421
##  3 Arizona                        -1.64 
##  4 Arkansas                       -3.64 
##  5 California                     -0.509
##  6 Colorado                       -1.68 
##  7 Connecticut                    -0.252
##  8 Delaware                       -3.18 
##  9 District of Columbia            0    
## 10 Florida                        -1.88 
## # … with 42 more rows
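
To see the iterate-then-join pattern in isolation, here is a minimal sketch that swaps the API call for a hypothetical stand-in function, fake_rural_share. The state names and numbers it returns are invented purely for illustration. The sketch also uses the suffix argument of full_join, which labels the two rural_pct columns by year rather than with the default .x and .y.

```r
library(dplyr)
library(purrr)

# Hypothetical stand-in for a get_decennial() call; the numbers are made up
fake_rural_share <- function(year) {
  tibble(
    NAME      = c("State A", "State B"),
    rural_pct = if (year == 2000) c(40, 10) else c(38, 10.5)
  )
}

demo_years <- c(2000, 2010)

# One table per year, stored in a list named by year
shares_by_year <- map(demo_years, fake_rural_share)
names(shares_by_year) <- demo_years

change_demo <- full_join(
  shares_by_year[["2000"]],
  shares_by_year[["2010"]],
  by = "NAME",
  suffix = c("_2000", "_2010")
) %>%
  mutate(rural_pct_change = rural_pct_2010 - rural_pct_2000) %>%
  select(NAME, rural_pct_change)

change_demo
```

Because the change is a difference between two percentages, it is measured in percentage points.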
basegraph<-rural_change %>%
  ggplot(aes(x = reorder(NAME,rural_pct_change), y=rural_pct_change)) + 
  geom_col()+
  coord_flip()

basegraph+labs(title="Rural Depopulation", x="State Name", y="Percentage Point Change in Rural Population Share")+
  theme(plot.title=element_text(hjust=0.5))

rural_depop_tomap<-full_join(state_population_2010,rural_change,by="NAME")

library(tmap)
foundational_map<-tm_shape(rural_depop_tomap)+
  tm_polygons(col="rural_pct_change", n=6,style="jenks",palette="BuGn", midpoint=TRUE)

foundational_map
## Warning: The shape rural_depop_tomap contains empty units.

##custom breaks and title
revised_map<-tm_shape(rural_depop_tomap)+
  tm_polygons(col="rural_pct_change", breaks=c(-6,-4,-2, 0, 1, 2),palette="YlGnBu", midpoint=TRUE)+
  tm_layout(frame=FALSE, main.title="Percentage Point Change\nin Rural Population, By State",  
              main.title.position="left", legend.outside=TRUE)

revised_map
## Warning: The shape rural_depop_tomap contains empty units.
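
The "empty units" warning most likely arises because the change table has 52 rows (it appears to include Puerto Rico) while the shifted state geometry does not, so full_join adds a row with no shape attached. The sketch below illustrates the difference between full_join and left_join with plain tables; shapes_demo and change_demo are made-up names, the geometry column is just placeholder text, and the Puerto Rico value is a placeholder rather than a real figure.

```r
library(dplyr)

# Placeholder "geometry" table with two states
shapes_demo <- tibble(
  NAME     = c("Alabama", "Colorado"),
  geometry = c("polygon_1", "polygon_2")
)

# Change table with one extra place that has no matching shape
change_demo <- tibble(
  NAME             = c("Alabama", "Colorado", "Puerto Rico"),
  rural_pct_change = c(-3.59, -1.68, -1)  # the last value is a placeholder
)

# full_join keeps the unmatched row, leaving its geometry missing
joined_full <- full_join(shapes_demo, change_demo, by = "NAME")

# left_join keeps only the rows that have a shape
joined_left <- left_join(shapes_demo, change_demo, by = "NAME")

joined_full
joined_left
```

If the warning is unwanted, joining with left_join (with the spatial table on the left) is one way to keep only places that can actually be drawn.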

Student Visualization Practice

Practice visualizing Census data by doing ONE of the following: 1) make a map (using the tmap package) that shows county-level variation in median age across the state of Colorado, or 2) make a visualization (using the ggplot2 package) of county-level variation in median age across the state of Colorado.

Option 1 Code

median_age_CO<- get_decennial(geography = "county",
                              state="CO",
                              variables = "P013001", 
                              year = 2010,
                              geometry = TRUE) %>% 
                rename(median_age=value) %>% 
                relocate(NAME)
median_age_CO_map<-tm_shape(median_age_CO)+
                   tm_polygons(col="median_age",breaks=c(30,35,40,45,50),palette="YlGnBu", midpoint=TRUE)+
                   tm_layout(frame=FALSE, main.title="Median Age by County,\nColorado",  
                   main.title.position="left", legend.outside=TRUE)

median_age_CO_map

Making a Web Map

tmap_mode("view")
median_age_CO_map

Option 2 Code

median_age_CO_visualization<-
  median_age_CO %>%
  ggplot(aes(x = median_age, y = reorder(NAME, median_age))) + 
  geom_point()+
  labs(title="Median Age by County, CO", x="Median Age", y="County Name")+
  theme(plot.title=element_text(hjust=0.5))

median_age_CO_visualization

median_age_CO_cleaned<-median_age_CO %>% 
                       mutate(County_Name=str_squish(str_remove_all(NAME,"Colorado|,|County")))

median_age_CO_cleaned_visualization<-
  median_age_CO_cleaned %>%
  ggplot(aes(x = median_age, y = reorder(County_Name, median_age))) + 
  geom_point()+
  labs(title="Median Age by County, CO", x="Median Age", y="County")+
  theme(plot.title=element_text(hjust=0.5))

median_age_CO_cleaned_visualization
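
One thing to watch when stripping pieces out of a string with str_remove_all is leftover whitespace. The short sketch below, which uses two example county names and needs no API key, shows the extra spaces left behind by the pattern used above and one way to tidy them with str_squish.

```r
library(stringr)

county_names <- c("Adams County, Colorado", "La Plata County, Colorado")

# Removing the pieces one by one leaves trailing spaces behind
raw_removed <- str_remove_all(county_names, "Colorado|,|County")

# str_squish trims the ends and collapses repeated internal spaces
cleaned <- str_squish(raw_removed)

nchar(raw_removed)
cleaned
```

Clean labels matter most on axes and in joins, where a stray space can make two otherwise identical names fail to match.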

American Community Survey

Calling ACS Data

To inspect the variable list for the ACS, use the “load_variables” function. Let’s say we want to work with the 5-year ACS ending in 2019:

ACS_5_2019<-load_variables(2019,"acs5")
View(ACS_5_2019)
name label concept
B01001_001 Estimate!!Total: SEX BY AGE
B01001_002 Estimate!!Total:!!Male: SEX BY AGE
B01001_003 Estimate!!Total:!!Male:!!Under 5 years SEX BY AGE
B01001_004 Estimate!!Total:!!Male:!!5 to 9 years SEX BY AGE
B01001_005 Estimate!!Total:!!Male:!!10 to 14 years SEX BY AGE
B01001_006 Estimate!!Total:!!Male:!!15 to 17 years SEX BY AGE

Let’s issue a call to the API and generate a table that gives us the median household income of every county in the United States. We may eventually want to map this data, so we’ll set the geometry parameter equal to TRUE. Note that the “get_acs” function returns, by default, data from the 5-year ACS that ends in the specified year (i.e. if the year parameter is set to 2019, the function will return the 2015-2019 ACS). If we want to call the 1-year or 3-year ACS instead, we can set the “survey” argument of “get_acs” to “acs1” or “acs3” (keeping in mind that the 3-year ACS was discontinued after the 2011-2013 release).

median_income<-get_acs(geography="county",
                       variables="B19013_001",
                       year=2019,
                       geometry=TRUE,
                       shift_geo=TRUE) %>% 
              rename(median_income=estimate) %>% 
              arrange(desc(median_income))
               
View(median_income)
GEOID NAME variable median_income moe geometry
51107 Loudoun County, Virginia B19013_001 142299 2089 MULTIPOLYGON (((1906352 -41…
51610 Falls Church city, Virginia B19013_001 127610 16144 MULTIPOLYGON (((1949973 -40…
51059 Fairfax County, Virginia B19013_001 124831 1281 MULTIPOLYGON (((1956587 -41…
06085 Santa Clara County, California B19013_001 124055 1117 MULTIPOLYGON (((-1902539 -6…
06081 San Mateo County, California B19013_001 122641 1680 MULTIPOLYGON (((-1952774 -6…
35028 Los Alamos County, New Mexico B19013_001 121324 4613 MULTIPOLYGON (((-565023.3 -…

Manipulating and Visualizing ACS Data: dplyr’s “group_by” and “slice” functions, and visualizing uncertainty using ggplot

Let’s say that we want to generate a table that contains the highest median-income county for each state. To do so, we will use dplyr’s “group_by” and “slice” functions, after separating the “NAME” field in the existing table (which is in the form “County Name, State”) into separate “County” and “State” fields:

highest_income_counties<-median_income %>% 
  separate(NAME,c("County","State"),sep=",") %>% 
  group_by(State) %>% 
  arrange(desc(median_income)) %>% 
  slice(1) %>% 
  unite(NAME, c("County","State"), remove=FALSE, sep=",")

View(highest_income_counties)
knitr::kable(highest_income_counties)
GEOID NAME County State variable median_income moe geometry
01117 Shelby County, Alabama Shelby County Alabama B19013_001 77799 2248 MULTIPOLYGON (((1207383 -12…
02110 Juneau City and Borough, Alaska Juneau City and Borough Alaska B19013_001 88390 4059 MULTIPOLYGON (((-780061.4 -…
04013 Maricopa County, Arizona Maricopa County Arizona B19013_001 64468 326 MULTIPOLYGON (((-1075618 -1…
05007 Benton County, Arkansas Benton County Arkansas B19013_001 66362 1292 MULTIPOLYGON (((490499.7 -9…
06085 Santa Clara County, California Santa Clara County California B19013_001 124055 1117 MULTIPOLYGON (((-1902539 -6…
08035 Douglas County, Colorado Douglas County Colorado B19013_001 119730 1710 MULTIPOLYGON (((-432929.3 -…
09001 Fairfield County, Connecticut Fairfield County Connecticut B19013_001 95645 1039 MULTIPOLYGON (((2153259 -17…
10003 New Castle County, Delaware New Castle County Delaware B19013_001 73892 1210 MULTIPOLYGON (((2058738 -29…
11001 District of Columbia, District of Columbia District of Columbia District of Columbia B19013_001 86420 1008 MULTIPOLYGON (((1960720 -41…
12109 St. Johns County, Florida St. Johns County Florida B19013_001 82252 2741 MULTIPOLYGON (((1768025 -14…
13117 Forsyth County, Georgia Forsyth County Georgia B19013_001 107218 2004 MULTIPOLYGON (((1441485 -10…
15003 Honolulu County, Hawaii Honolulu County Hawaii B19013_001 85857 907 MULTIPOLYGON (((-481727.5 -…
16081 Teton County, Idaho Teton County Idaho B19013_001 74216 3576 MULTIPOLYGON (((-880601 -53…
17093 Kendall County, Illinois Kendall County Illinois B19013_001 96563 4721 MULTIPOLYGON (((970466.8 -2…
18057 Hamilton County, Indiana Hamilton County Indiana B19013_001 98173 2249 MULTIPOLYGON (((1173473 -46…
19049 Dallas County, Iowa Dallas County Iowa B19013_001 88479 3234 MULTIPOLYGON (((473355.5 -3…
20091 Johnson County, Kansas Johnson County Kansas B19013_001 89087 998 MULTIPOLYGON (((467219 -668…
21185 Oldham County, Kentucky Oldham County Kentucky B19013_001 99128 3974 MULTIPOLYGON (((1262586 -63…
22005 Ascension Parish, Louisiana Ascension Parish Louisiana B19013_001 80527 3017 MULTIPOLYGON (((861410.4 -1…
23005 Cumberland County, Maine Cumberland County Maine B19013_001 73072 1427 MULTIPOLYGON (((2332898 298…
24027 Howard County, Maryland Howard County Maryland B19013_001 121160 2169 MULTIPOLYGON (((1938398 -36…
25019 Nantucket County, Massachusetts Nantucket County Massachusetts B19013_001 107717 5735 MULTIPOLYGON (((2431244 400…
26093 Livingston County, Michigan Livingston County Michigan B19013_001 84221 1674 MULTIPOLYGON (((1320579 -11…
27139 Scott County, Minnesota Scott County Minnesota B19013_001 102152 3021 MULTIPOLYGON (((525638.1 -1…
28089 Madison County, Mississippi Madison County Mississippi B19013_001 71824 2728 MULTIPOLYGON (((961529.6 -1…
29183 St. Charles County, Missouri St. Charles County Missouri B19013_001 84978 1195 MULTIPOLYGON (((817474.5 -6…
30043 Jefferson County, Montana Jefferson County Montana B19013_001 69646 4258 MULTIPOLYGON (((-899443.3 1…
31153 Sarpy County, Nebraska Sarpy County Nebraska B19013_001 82032 1552 MULTIPOLYGON (((341452 -429…
32015 Lander County, Nevada Lander County Nevada B19013_001 88030 21398 MULTIPOLYGON (((-1450087 -3…
33015 Rockingham County, New Hampshire Rockingham County New Hampshire B19013_001 93756 1893 MULTIPOLYGON (((2305592 212…
34027 Morris County, New Jersey Morris County New Jersey B19013_001 115527 1813 MULTIPOLYGON (((2080464 -13…
35028 Los Alamos County, New Mexico Los Alamos County New Mexico B19013_001 121324 4613 MULTIPOLYGON (((-565023.3 -…
36059 Nassau County, New York Nassau County New York B19013_001 116100 1093 MULTIPOLYGON (((2171497 -13…
37183 Wake County, North Carolina Wake County North Carolina B19013_001 80591 822 MULTIPOLYGON (((1940120 -76…
38105 Williams County, North Dakota Williams County North Dakota B19013_001 87161 7443 MULTIPOLYGON (((-298796 384…
39041 Delaware County, Ohio Delaware County Ohio B19013_001 106908 2786 MULTIPOLYGON (((1451197 -37…
40017 Canadian County, Oklahoma Canadian County Oklahoma B19013_001 72056 1690 MULTIPOLYGON (((211597.9 -1…
41067 Washington County, Oregon Washington County Oregon B19013_001 82215 997 MULTIPOLYGON (((-1743904 30…
42029 Chester County, Pennsylvania Chester County Pennsylvania B19013_001 100214 1232 MULTIPOLYGON (((2060174 -23…
44009 Washington County, Rhode Island Washington County Rhode Island B19013_001 85531 2042 MULTIPOLYGON (((2319619 -14…
45013 Beaufort County, South Carolina Beaufort County South Carolina B19013_001 68377 1987 MULTIPOLYGON (((1785885 -11…
46083 Lincoln County, South Dakota Lincoln County South Dakota B19013_001 82473 2951 MULTIPOLYGON (((276504.4 -1…
47187 Williamson County, Tennessee Williamson County Tennessee B19013_001 112962 2976 MULTIPOLYGON (((1150429 -92…
48397 Rockwall County, Texas Rockwall County Texas B19013_001 100920 4011 MULTIPOLYGON (((327048.3 -1…
49043 Summit County, Utah Summit County Utah B19013_001 102958 5613 MULTIPOLYGON (((-920528.2 -…
50007 Chittenden County, Vermont Chittenden County Vermont B19013_001 73647 2249 MULTIPOLYGON (((2092685 314…
51107 Loudoun County, Virginia Loudoun County Virginia B19013_001 142299 2089 MULTIPOLYGON (((1906352 -41…
53033 King County, Washington King County Washington B19013_001 94974 726 MULTIPOLYGON (((-1661250 51…
54037 Jefferson County, West Virginia Jefferson County West Virginia B19013_001 80430 3750 MULTIPOLYGON (((1878385 -35…
55133 Waukesha County, Wisconsin Waukesha County Wisconsin B19013_001 87277 1110 MULTIPOLYGON (((929187.5 -1…
56039 Teton County, Wyoming Teton County Wyoming B19013_001 84678 8230 MULTIPOLYGON (((-879496.6 -…
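
The same "top row per group" result can also be reached with dplyr's slice_max, which avoids the separate arrange step. Here is a minimal sketch on a hand-built table (income_demo is a made-up name; the figures are copied from the county table shown earlier), so it runs without an API key. Splitting on ", " (a comma followed by a space) also keeps the State field free of a leading space.

```r
library(dplyr)
library(tidyr)

# Five counties copied from the median income table shown earlier
income_demo <- tibble(
  NAME = c("Loudoun County, Virginia", "Falls Church city, Virginia",
           "Fairfax County, Virginia", "Santa Clara County, California",
           "San Mateo County, California"),
  median_income = c(142299, 127610, 124831, 124055, 122641)
)

top_by_state <- income_demo %>%
  # Keep NAME and add County and State columns alongside it
  separate(NAME, c("County", "State"), sep = ", ", remove = FALSE) %>%
  group_by(State) %>%
  # Within each state, keep the row with the largest median income
  slice_max(median_income, n = 1) %>%
  ungroup()

top_by_state
```

Swapping slice_max for slice_min returns the lowest-income row in each group instead.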

If we want to visualize this information, we can incorporate the MOE for these estimates into the visualization, so that we can convey the uncertainty surrounding these median income estimates.

highest_income_counties_viz<-highest_income_counties %>% 
                             ggplot(aes(x=median_income,y=reorder(NAME, median_income)))+
                             geom_errorbarh(aes(xmin = median_income - moe, xmax = median_income + moe)) +
                             geom_point(color = "red", size = 3)+
                                        labs(title="County with Highest Median Income, by State",
                                        y="",
                                        x="Median Income Estimate from ACS (bars indicate margin of error)")
highest_income_counties_viz                                    
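
The error bars in the plot are simply the estimate minus and plus the margin of error, which the ACS publishes at a 90 percent confidence level. As a small sketch (using a hand-built table, moe_demo, with two rows copied from the tables above), we can compute those bounds directly, along with the margin of error as a share of the estimate, to see how much the uncertainty differs between a large and a small county:

```r
library(dplyr)

# Two counties copied from the tables above
moe_demo <- tibble(
  NAME          = c("Loudoun County, Virginia", "Lander County, Nevada"),
  median_income = c(142299, 88030),
  moe           = c(2089, 21398)
) %>%
  mutate(
    lower     = median_income - moe,         # left end of the error bar
    upper     = median_income + moe,         # right end of the error bar
    moe_share = (moe / median_income) * 100  # MOE as a percent of the estimate
  )

moe_demo
```

A margin of error that is a large share of the estimate, as it is for the smaller county here, is a signal to be cautious when ranking or comparing places.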

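Student Exercise

Repeat the steps above to build a table, and an accompanying visualization, of the county with the lowest median income in each state.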

correlation between income and election results

correlation between health insurance and covid

Additional Work with Dplyr and Visualization

##county median age map?
